I.  Introduction


During  the  period  of  this  grant,  our  detailed  technical  accomplishments  are  reported  through 
journal  articles  and  technical  reports.  Each  of  our  semi-annual  reports  highlights  certain
technical  areas. 

II.  Energy  Efficient  Implementation  of  DSP  Systems 

Over  the  last  few  months,  we’ve  been  working  on  several  new  approaches  and  tools  for  the 
energy  efficient  implementation  of  DSP  systems.  Specifically,  we  have  focused  on  three
areas:  approximate  filtering  techniques,  power  reduction  in  delay  line  structures,  and
power  estimation  at  the  Register  Transfer  Level  (RTL).

We  have  proposed  a  low-power  filtering  technique  in  which  the  number  of  sections  used  in 
an  IIR  or  FIR  digital  filter  structure  is  adaptively  changed  in  order  to  dynamically  control  the
net  stopband  attenuation.  This  technique  is  based  on  an  efficient  measure  of  the 
input/output  power  differential,  and  has  been  shown  empirically  to  be  effective  in  reducing 
power  consumption  without  noticeable  quality  degradation.  Currently  we  are  exploring  the 
theoretical  rationale  and  statistical  properties  of  this  technique,  and  have  shown  that  under 
certain  assumptions  the  technique  converges  to  a  suitably-defined  optimal  number  of 
filtering  sections.  By  developing  a  theoretical  framework  for  analyzing  the  performance  of 
the  low-power  filtering  technique,  we  will  be  able  to  quantify  its  performance  and 
limitations. 
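
The control idea can be illustrated with a minimal sketch. The report does not give the actual update rule, so everything below is an assumption made for illustration: the function names, the block-based power estimate, the two thresholds, and the one-section-at-a-time adjustment are all hypothetical. The sketch only shows how an input/output power differential could drive the number of active filter sections.

```python
def mean_power(samples):
    """Average squared magnitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def choose_sections(x_block, y_block, current,
                    min_sections=1, max_sections=8,
                    low_thresh=0.05, high_thresh=0.5):
    """Return the section count to use for the next block.

    Hypothetical rule: thresholds and step size are illustrative only.
    """
    # Input/output power differential: a proxy for the energy the
    # filter is currently removing from the signal.
    diff = mean_power(x_block) - mean_power(y_block)
    if diff > high_thresh and current < max_sections:
        return current + 1   # much energy is being rejected: add a section
    if diff < low_thresh and current > min_sections:
        return current - 1   # little to reject: drop a section to save power
    return current
```

In this sketch, a block whose output power is close to its input power causes a section to be switched off, which is where the power saving would come from.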

We  have  also  been  working  on  reducing  power  consumption  in  long  delay  line  structures 
commonly  found  in  signal  processing  and  communications  applications  (e.g.,  matched 
filters).  Data  shifting  in  the  delay  lines  can  be  power  hungry  due  to  high  clock  and  register 
power  consumption.  We  have  been  working  on  an  approach  to  significantly  reduce  the 
switched  capacitance  at  a  fixed  power  supply  voltage.  The  basic  idea  involves  using  bi-level
parallelism  to  reduce  clock  frequencies  without  loss  in  functional  throughput  (i.e.,  the
I/O  data  rate  is  fixed).  For  N=2,  two  parallel  shift  registers  are  used  each  with  half  the 
original  length  and  clock  frequency.  Without  accounting  for  the  multiplexor  overhead,  this 
results  in  a  factor  of  2  power  reduction  without  loss  in  performance.  In  general,  an  N-fold 
reduction  in  power  is  achieved  for  N-level  parallelism.  Unfortunately,  the  overhead 
circuitry  associated  with  parallelism  (routing,  multiplexors,  control  signal  generation,  etc.) 
limits  the  amount  of  power  reduction  possible  and  results  in  an  "optimum"  level  of 
parallelism.  We  have  performed  layout  of  various  matched  filter  architectures  and 
simulated  power  consumption  using  switch  level  simulators. 
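
The trade-off between the N-fold reduction and the overhead can be sketched numerically. The ideal 1/N term follows from the text above; the overhead term is an assumption made purely for illustration (zero for N=1 and growing linearly with each added lane), and the function names and parameter values are hypothetical rather than measured.

```python
def delay_line_power(n, p_serial, p_overhead_per_lane):
    """Relative power of a delay line with n-level parallelism."""
    if n < 1:
        raise ValueError("n must be at least 1")
    shifting = p_serial / n                    # ideal N-fold reduction
    overhead = p_overhead_per_lane * (n - 1)   # assumed linear overhead model
    return shifting + overhead


def best_parallelism(p_serial, p_overhead_per_lane, n_max=64):
    """Level of parallelism in 1..n_max that minimizes the modeled power."""
    return min(range(1, n_max + 1),
               key=lambda n: delay_line_power(n, p_serial, p_overhead_per_lane))
```

Under this assumed model, an overhead of 1% of the serial power per added lane puts the optimum at N=10; with zero overhead the optimum is simply the largest N allowed.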

Finally,  we  have  been  involved  with  developing  a  power  estimation  tool  that  works  with  a 
structural  circuit  description.  The  tool  is  capable  of  estimating  circuit  power  dissipation  by 
monitoring  signal  transitions  on  all  circuit  nets  based  on  the  validation  test  vectors  used. 
Unlike  commercial  power  estimators  such  as  PowerMill,  this  tool  does  not  work  at  the 
transistor  level.  It  works  at  the  gate/module  level  and,  as  a  result,  large  systems  can  be
simulated  for  a  large  number  of  input  vectors.  The  goal  is,  however,  not  to  compromise
estimation  accuracy.  The  user  can  provide  information  about  the  internal  capacitive  nodes 
of  low-level  modules  and  link  them  to  input  transitions.  In  such  a  way,  both  simulation 
speed  and  estimation  accuracy  are  preserved.  Each  net  is  annotated  with  three  different 
capacitance  values  (gate,  junction,  and  routing  capacitance).  The  model  accounts  for  the 
non-linear  capacitance  variations  with  voltage  associated  with  the  gate  and  junction 
components.  Therefore,  accurate  power  estimates  at  different  supply  voltages  can  be
obtained  at  the  RTL  level.  The  extra  transitions  due  to  glitching  are  also  accounted  for  by
using  a  reasonable  timing  model.  Second-order  effects  such  as  reduced  swing  on  some
nodes  (e.g.,  pulling  a  node  up  using  an  NMOS  device)  are  also  included.  This  tool  allows  a
hybrid  approach  in  which  some  modules  in  the  design  can  be  evaluated  using  simulation
(e.g.,  a  multiplier  whose  energy  consumption  is  very  strongly  dependent  on  input  patterns) 
and  others  using  high-level  black  box  models  (e.g.,  an  SRAM  array  where  the  energy  per
access  is  independent  of  data  patterns).  Preliminary  simulation  results  are  very 
encouraging.  The  power  consumption  of  register  and  adder  structures  has  been  simulated
and  appears  to  be  within  20%  of  SPICE  simulations.  Such  a  tool  can  be  used  to  explore  the
design  space  at  the  RTL  level,  without  having  to  map  a  design  to  layout  for  accurate  power 
estimation.  After  validating  this  tool  using  larger  design  examples,  we  plan  to  use  it  to 
evaluate  the  power  efficiency  of  various  DSP  filter  architectures. 
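
The core of transition-based estimation can be sketched as follows. This is not the tool described above: it is a simplified illustration that assumes a fixed (voltage-independent) capacitance per net, full-swing transitions, and one sample per clock cycle, so it ignores the non-linear capacitance model, glitching, and reduced-swing effects mentioned in the text. All names and data structures are hypothetical.

```python
def count_transitions(waveform):
    """Number of 0->1 or 1->0 changes in a list of logic values."""
    return sum(1 for a, b in zip(waveform, waveform[1:]) if a != b)


def estimate_power(nets, waveforms, vdd, clock_period):
    """Average power in watts.

    nets:      net name -> (c_gate, c_junction, c_routing) in farads
    waveforms: net name -> list of 0/1 values, one per clock cycle
    """
    energy = 0.0
    cycles = 0
    for name, caps in nets.items():
        wave = waveforms[name]
        cycles = max(cycles, len(wave))
        # Each full-swing transition is charged 0.5 * C * Vdd^2.
        energy += 0.5 * sum(caps) * vdd ** 2 * count_transitions(wave)
    if cycles == 0:
        raise ValueError("no test vectors supplied")
    return energy / (cycles * clock_period)
```

Because only transition counts per net are needed, the cost of such an estimate grows with the number of nets and vectors rather than with the number of transistors.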

III.  Algorithm-Based  Fault  Tolerance 

This  section  describes  our  research  progress  in  the  area  of  Algorithm-Based  Fault 
Tolerance.  Our  recent  work  in  this  area  has  investigated  methods  for  introducing  controlled 
or  systematic  redundancy  into  dynamic  systems  that  implement  signal  processing 
operations.  This  line  of  investigation  originated  in  the  algebraic  setting  of  state  machines 
used  to  implement  group  or  semigroup  computations.  In  such  structures,  the  composition 
of  an  input  with  a  present  state  constitutes  the  computation,  with  the  next  state  being  the 
result,  [1].  Using  the  algebraic  framework  that  was  developed  in  the  thesis  [2],  we  have 
been  able  to  show  that  in  order  to  introduce  redundancy  into  a  given  group  (or  semigroup) 
machine,  we  need  to  design  a  larger  machine  that  performs  a  computation  in  a  larger 
group  (or  semigroup).  The  original  (semi)group  is  mapped  into  this  larger  (semi)group 
through  an  algebraic  homomorphism.  The  rich  theory  of  decomposition  for  machines 
allowed  us  to  take  a  closer  look  at  the  actual  structure  of  the  redundant  machine.  For 
instance,  it  is  known  that  a  group  machine  can  be  decomposed  into  a  subgroup  machine 
and  a  coset  leader  machine,  [1].  In  the  case  of  separate  parity  checks  for  computations  in 
groups,  we  have  been  able  to  show  that  fault  tolerance  is  introduced  by  replicating  the 
coset  leader  machine,  whereas  for  non-separate  checks  we  need  to  decompose  the 
redundant  machine  using  a  subgroup  other  than  the  one  in  which  the  original  computation 
is  performed.  Most  of  the  above  results  can  be  extended  to  the  class  of  group 
homomorphic  systems  studied  in  [3]. 

In  order  to  obtain  more  detailed  and  constructive  results,  our  recent  focus  has  been  on 
linear,  time-invariant  (LTI)  state-space  models,  [4],  [5].  We  anticipate  similar  results  for 
systems  modeled  by  factored  state-variable  representations  [6]  or  signal  flow  graphs, 
which  are  important  in  a  variety  of  signal  processing  tasks.  A  redundant  version  of  a  given 
LTI  state-space  model  is  obtained  by  embedding  it  in  a  larger  model  using  a  “state 
homomorphic  mapping.”  This  mapping  takes  the  states  of  the  original  non-redundant 
system  into  a  larger  redundant  space,  while  encoding  —  within  this  larger  space  —  the 
properties  of  the  original  system.  The  added  redundancy  can  be  used  to  obtain  error 
detection  and  correction  under  hardware  failures. 

More  specifically,  the  state  of  the  redundant  system  at  any  given  time  allows  us  to  calculate 
the  original  state  vector  through  a  linear  mapping.  We  show  that,  in  order  for  this  to 
happen,  it  is  necessary  that  all  the  original  modes  appear  in  the  redundant  system.  In 
addition,  the  redundant  system  has  modes  that  are  observable  but  unreachable  under  fault- 
free  conditions;  these  modes  are  initialized  to  zero  and,  because  they  are  unreachable, 
manifest  themselves  only  if  a  fault  takes  place.  By  detecting  the  presence  of  such 
unreachable  modes,  we  can  detect,  locate  and  subsequently  correct  errors. 
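
A minimal sketch of this idea, for the simplest case of one added state variable, is given below. It assumes a single-input system x[k+1] = A x[k] + B u[k] and appends a checksum state equal to the sum of the original states; the difference between that state and the actual sum plays the role of the redundant mode, which stays at zero unless a fault occurs. The construction and all function names are illustrative assumptions, not the general framework of the report.

```python
def matvec(m, v):
    """Ordinary matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def build_redundant(a, b):
    """Append one checksum state c = sum(x) to x[k+1] = A x[k] + B u[k]."""
    n = len(a)
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    a_red = [row + [0.0] for row in a] + [col_sums + [0.0]]
    b_red = b + [sum(b)]
    return a_red, b_red


def step(a_red, b_red, state, u):
    """One state update of the redundant system for scalar input u."""
    return [v + bi * u for v, bi in zip(matvec(a_red, state), b_red)]


def parity(state):
    """Zero in fault-free operation; nonzero once a fault corrupts a state."""
    return state[-1] - sum(state[:-1])
```

In this particular construction the checksum is recomputed from the original states at every step, so the parity must be checked at each step: a corrupted state variable shows up in the parity immediately, before the next update absorbs it.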

All  fault-tolerant  versions  of  a  given  non-redundant  LTI  state-space  system  can  be  put  into 
a  standard  form  through  similarity  transformations.  In  the  standard  form,  the  redundant 
modes  are  seen  by  inspection  to  be  unreachable,  and  the  coupling  between  the  original  and 
the  redundant  modes  is  unidirectional:  the  original  modes  can  be  affected  by  the  redundant 
ones,  but  not  vice-versa.  The  coupling  of  the  redundant  modes  to  the  original  ones  is
unimportant  under  fault-free  conditions,  because  the  redundant  modes  are  never  excited 
under  these  conditions.  However,  the  coupling  can  be  an  important  factor  when 
considering  the  error  detecting  and  correcting  capabilities  of  the  fault-tolerant  design. 

Our  framework  is  general  enough  to  include  previously  developed  fault-tolerant  schemes 
for  LTI  state-space  systems,  such  as  modular  redundancy,  or  the  “checksum”  scheme 
developed  by  Abraham  in  [7].  However,  we  are  now  also  able  to  develop  more  advanced
checksum  schemes  that  resemble  linear  error-correcting  codes.  In  such  schemes,  the  parity 
checks  (i.e.,  the  set  of  linear  equations  that  check  whether  the  redundant  modes  have  been 
excited  or  not)  form  a  linear  error  correcting  code.  The  code  allows  one  to  easily  detect  and 
locate  one  or  more  transient  faults  (depending  on  the  number  of  modes  that  have  been 
added  in  the  redundant  system).  Error  correction  is  straightforward,  once  we  locate  the 
erroneous  state  variables.  Given  any  set  of  parity  checks,  one  can  always  use  an 
appropriate  similarity  transformation  in  order  to  transform  these  checks  to  a  linear  code.
Assuming  that  a  single  fault  corrupts  a  single  state  variable,  we  can  provide  single  error 
correction  at  the  expense  of  very  few  additional  state  variables.  For  example,  by  adding 
O(log(N))  redundant  variables,  we  provide  single-error-correction  to  a  system  of  order
N. 
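
One way to see why a logarithmic number of parity variables can suffice is a Hamming-style assignment, sketched below for a static snapshot of the state vector. This is an illustrative assumption, not the construction used in our work: each state variable is given a distinct binary pattern of weight at least two, each parity variable is the sum of the state variables whose pattern has the corresponding bit set, and the set of violated checks then identifies the corrupted variable. The function names and the tolerance are hypothetical.

```python
def parity_columns(n):
    """Distinct binary patterns of weight >= 2, one per state variable."""
    cols, value = [], 3
    while len(cols) < n:
        if bin(value).count("1") >= 2:
            cols.append(value)
        value += 1
    return cols


def encode(x):
    """Return (parity values, patterns) for the state vector x."""
    cols = parity_columns(len(x))
    r = max(cols).bit_length()   # number of parity variables, about log2(n)
    checks = [sum(xi for xi, c in zip(x, cols) if (c >> b) & 1)
              for b in range(r)]
    return checks, cols


def correct_single_error(x, checks, cols, tol=1e-9):
    """Repair at most one corrupted entry of x or checks, in place."""
    syndrome = [sum(xi for xi, c in zip(x, cols) if (c >> b) & 1) - p
                for b, p in enumerate(checks)]
    pattern = sum(1 << b for b, s in enumerate(syndrome) if abs(s) > tol)
    if pattern == 0:
        return None                          # all checks satisfied
    if pattern in cols:                      # a state variable is wrong
        j = cols.index(pattern)
        x[j] -= next(s for s in syndrome if abs(s) > tol)
        return ("state", j)
    if pattern & (pattern - 1) == 0:         # one check only: parity is wrong
        b = pattern.bit_length() - 1
        checks[b] += syndrome[b]
        return ("parity", b)
    return ("uncorrectable", None)           # more than one error
```

For four state variables this sketch uses three parity variables; the number of parity variables grows roughly as the base-2 logarithm of the system order.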

We  are  currently  investigating  the  possibility  of  extending  our  results  to  a  class  of  dynamic 
systems  in  state  form  known  as  “max-plus  systems.”  These  systems  are  nonlinear,  but 
have  some  analogies  with  traditional  LTI  state-space  systems,  in  that  they  are  described  by 
analogous  state  evolution  and  output  equations.  The  only  difference  is  that  regular  addition 
is  replaced  by  the  MAX  operation,  and  regular  multiplication  is  replaced  by  +  (i.e.,  regular
addition).  The  resulting  setting  is  that  of  minimax  algebra,  and  has  been  studied 
extensively,  [8]-[10].  For  example,  MAX(3+x,5)=7  can  be  thought  of  as  a  linear  equation 
in  minimax  algebra  —  corresponding  to  3x+5=7  in  traditional  linear  algebra.  With  this 
type  of  substitution,  a  max-plus  system  looks  like  an  LTI  state-space  model.  Max-plus 
systems  are  used  to  model  a  large  class  of  discrete-event  processes,  with  documented  or 
potential  applications  for  scheduling  and  routing  in  various  types  of  networks  (e.g.,  for 
computing,  signal  processing,  communication,  transportation,  or  manufacturing),  [8]- 
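
The substitution described above can be made concrete with a short sketch of the max-plus state update x[k+1] = A ⊗ x[k], where entry i of the product is the maximum over j of A[i][j] + x[j]. The function names and the example matrix are illustrative assumptions; negative infinity is used as the max-plus counterpart of zero, meaning "no connection".

```python
NEG_INF = float("-inf")   # max-plus "zero": adding it never wins a max


def maxplus_matvec(a, x):
    """Max-plus product: entry i is max over j of a[i][j] + x[j]."""
    return [max(aij + xj for aij, xj in zip(row, x)) for row in a]


def simulate(a, x0, steps):
    """Return the list of states x[0], x[1], ..., x[steps]."""
    states = [x0]
    for _ in range(steps):
        states.append(maxplus_matvec(a, states[-1]))
    return states
```

For the scalar example in the text, x=4 satisfies MAX(3+x,5)=7, since MAX(7,5)=7. In a scheduling interpretation, each state would typically be the time at which a task can next start, and each matrix entry a processing or travel delay.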


Due  to  the  lack  of  an  inverse  for  the  MAX  operation,  the  introduction  of  redundancy  into  a 
max-plus  linear  system  aims  mostly  at  detecting  (rather  than  correcting)  errors,  and  at 
maintaining  a  desired  level  of  performance  despite  the  existence  of  faults.  For  example,  by 
ensuring  that  strategic  tasks  are  duplicated  by  additional  processors  in  a  signal  processing 
network,  one  could  guarantee  that  the  (main)  functionality  of  the  network  would  be 
maintained  despite  the  breakdown  of  certain  processors.  As  another  example,  in  a  railway 
network  one  hopes  to  ensure  robust  performance  despite  malfunctions  in  some  trains  (or 
stations)  by  introducing  additional  train  links  between  the  most  important  stations. 

We  have  been  able  to  show  that  all  redundant  max-plus  systems  that  can  be  used  to  protect 
a  given  original  system  (again  through  a  “state  homomorphic  mapping”)  are  similar  to  a 
system  in  a  certain  standard  form.  The  standard  system  contains  the  original  state  variables 
intact,  along  with  a  number  of  parity  state  variables.  The  choice  of  these  parity  variables  is 
arbitrary;  the  coupling  between  them  and  the  original  ones,  however,  is  not  necessarily 
unidirectional  (as  was  the  case  for  LTI  state-space  systems)  and  has  to  be  carefully  chosen 
so  that  the  original  variables  evolve  in  the  same  way  as  they  would  in  the  original  system. 
Using  this  framework,  we  have  developed  new  examples  of  fault-tolerant  max-plus 
systems.  Most  of  these  examples  are  based  on  state  variable  replication,  a  scheme  in  which 
we  selectively  replicate  some  of  the  most  important  state  variables. 
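
State variable replication can be sketched as follows for the simplest case, in which one extra state variable duplicates the update of a chosen original variable and does not feed back into the original ones. The function names are hypothetical, and the sketch covers detection only, consistent with the lack of an inverse for the MAX operation.

```python
NEG_INF = float("-inf")   # max-plus "zero": marks the absence of coupling


def maxplus_matvec(a, x):
    """Max-plus product: entry i is max over j of a[i][j] + x[j]."""
    return [max(aij + xj for aij, xj in zip(row, x)) for row in a]


def replicate_state(a, i):
    """Matrix of a redundant system whose last state duplicates state i."""
    a_red = [row + [NEG_INF] for row in a]   # originals ignore the replica
    a_red.append(a[i] + [NEG_INF])           # replica repeats row i
    return a_red


def replica_mismatch(state, i):
    """True if state i and its replica (the last entry) disagree."""
    return state[i] != state[-1]
```

After each fault-free update the replica equals the variable it protects, so any disagreement signals an error in one of the two.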

We  expect  to  be  looking  further  at  ways  to  achieve  robust  performance  in  max-plus  linear
systems  and  other  dynamic  systems,  and  are  optimistic  that  our  paradigm  will  be  fruitful  in 
these  other  contexts. 